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Abstract — We consider the joint estimation of multipath chan- 
nels obtained with a set of receiving antennas and uniformly 
probed in the frequency domain. This scenario fits most of the 
modern outdoor communication protocols for mobile access |1| 
or digital broadcasting | 2 | among others. 

Such channels verify a Sparse Common Support property 
(SCS) which was used in |3) to propose a Finite Rate of 
Innovation (FRI) based sampling and estimation algorithm. In 
this contribution we improve the robustness and computational 
complexity aspects of this algorithm. The method is based on 
projection in Krylov subspaces to improve complexity and a new 
criterion called the Partial Effective Rank (PER) to estimate the 
level of sparsity to gain robustness. 

If P antennas measure a A'-multipath channel with N 
uniformly sampled measurements per channel, the algorithm 
possesses an O(KPNlogN) complexity and an O(KPN) mem- 
ory footprint instead of 0{PN 3 ) and 0{PN 2 ) for the direct 
implementation, making it suitable for K « N, The sparsity is 
estimated online based on the PER, and the algorithm therefore 
has a sense of introspection being able to relinquish sparsity if it 
is lacking. 

The estimation performances are tested on field measurements 
with synthetic AWGN, and the proposed algorithm outperforms 
non-sparse reconstruction in the medium to low SNR range 
OdB), increasing the rate of successful symbol decodings by 
l/10 th in average, and l/3 rd in the best case. The experiments 
also show that the algorithm does not perform worse than a non- 
sparse estimation algorithm in non-sparse operating conditions, 
since it may fall-back to it if the PER criterion does not detect 
a sufficient level of sparsity. 

The algorithm is also tested against a method assuming a 
"discrete" sparsity model as in Compressed Sensing (CS). The 
conducted test indicates a trade-off between speed and accuracy. 

Index Terms — Channel Estimation, Sparse, Finite Rate of 
innovation, Effective Rank, Krylov Subspace. 

I. Introduction 

COMMUNICATIONS between two parties are subject to 
unknowns: noise and filtering by the Channel Impulse 
Response (CIR). With respect to decoding, the noise is treated 
as nuisance parameters, and the CIR coefficients as unknowns 
to be estimated as precisely as possible to maximize the 
decoding rate. For this purpose, the channel can be used to 
transmit a signal known at both ends — the pilots — to gain 
knowledge about the CIR. 

It dictates a trade-off between the portion of the channel 
reserved to the pilots — thus lost to data — and the decoding 
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a) block layout b) comb layout c) scattered layout 

Fig. 1. In pilot assisted OFDM communications, time-frequency slots are 
reserved (black) for pilots, thus providing a sampling of the CIR in time and 
frequency. 




Fig. 2. The ideal SCS channel model is a set of P channels of bandwidth B 
each having K components aligned in time. Assuming complex valued signal 
components, the total number of unknowns is (2P + l)K instead of 3PK 
for a sparse model with independent time of arrivals (ToA), or 2P times the 
Nyquist Rate for a bandlimited model. 

error rate due to bad channel estimation, both affecting the 
communication bitrate. 

The two interdependent aspects of channel estimation are 
therefore the selection of pilots and the design of an estimation 
algorithm. We focus on the later and use uniform DFT 
pilot layouts shown in Fig. Q] which are found in modern 
communication standards using OFDM. The pilots provide 
information about the CIR, and so does an priori knowledge 
about its structure. 

In the noiseless case, the CIR can be perfectly recovered 
with a finite set of samples if it perfectly obeys the a priori 
known structure, thus providing a sampling theorem — e.g. 
uniform pilots in time at the Nyquist rate characterize uniquely 
bandlimited signals. 

In this paper we study Sparse Common Support (SCS) 
channels, i.e. channels sharing a common structure of very 
low-dimension. Fig. [2] shows an example of SCS channels. 

A. Problem definition 

Imposing a structure may not lead to a trivial linear system 
of equations as it is the case for Shannon/Nyquist (projection 
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in a linear subspace). The SCS structure is a union of K 
unidimensional subspaces ||4l, J5] shared by P channels which 
leads to an estimation problem exactly and efficiently solvable 
in the framework of Finite Rate of Innovation sampling 0. In 
|3j an estimation algorithm SCS-FRI is proposed and studied. 

This leads to two questions. First, as seen in Fig. [3ja), the 
SCS-FRI algorithm selects the K subspaces from an infinite 
and uncountable seQ which could be sampled. This is the 
approach taken by Compressed sensing (CS) or the sparse 
representation literature. The limiting case is to reduce the 
set to form a basis as shown in Fig. |3]b). The CIR can 
be perfectly and uniquely represented with elements of this 
set, but it probably won't have a X-sparse representation. A 
possible trade-off is to enlarge the set to form a frame as in 
Fig- 0c). Reconstruction becomes more complex, but the shift- 
invariance provided by the frame [6 1 gives a better X-sparse 
representation. 

Second, models are an approximation of reality — e.g. 
Fig. H] — and it is not immediately clear which amount 
of modelization error can be tolerated in practice. This is 
especially important as we rely on a very specific signal 
structure, and this question can only be answered by trials 
on field measurements. 



B. Contributions 

With these considerations in mind, we adress the estimation 
of Sparse Common Support (SCS) channels from DFT-domain 
measurements (pilots). SCS channels estimation with FRI was 
studied in a previous paper J3J, and we hereby focus on com- 
putational issues and robustness. A fast algorithm is derived 
based on Krylov subspace projections. Its main advantage is 
to have a computation and memory cost proportional to the 
sparsity level K. It relies on FFT evaluations for the heavy load 
computations, which is particularly appealing for embedded 
DSP applications. 

The sparsity level is unknown in practice, and we derive 
a heuristic estimate of it using a new measure called the 
Partial Effective Rank (PER). The PER tracks the "effective 
dimension" of the Krylov subspace as its size is increased, 
and can therefore be estimated online unlike other information 
criteria such as Akaike, MDL Q or the EDC JS]. 

To assess the performances of the proposed algorithm, sim- 
ulations are performed on measured CIRs to which synthetic 
AWGN is added. We compare the obtained results with an 
algorithm exploiting discrete sparsity to see if the input shift 
sensitivity described in Fig. |3jis a practical issue. 

C. Outline 

First, we will explicit the SCS channel model for aerial 
electromagnetic transmissions, and see under which conditions 
this model is relevant. Our conclusion is that the SCS property 
may not always be verified, which is confirmed by the data 
shown in Fig. [4] It establishes the requirements for a robust 
SCS estimation algorithm aware of the operating conditions. 

1 notwithstanding the limitations of machine precision. 



a) Evolution of CIR over time 
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b) From "sparse" to "not so sparse" in 20s 




Fig. 4. Field measurements collected in (£)• The receiver is a base-station 
with P — 8 antennas, and the transmitter is mobile. The image a) shows 
the magnitude of the first antenna's CIR. The channel is qualitatively sparse 
except when the transmitter goes through a tunnel. The real part of the CIR 
for three different antennas is shown in b) confirming the common support 
property and the transient nature of sparsity. 



Second, we quickly review the application of FRI and 
discrete sparse representations to the SCS estimation problem. 
Details are left out since they are already present in the 
literature OH, 0, flj]- 

The third part is devoted to the design of a fast and robust 
algorithm for the SCS estimation problem, based on Krylov 
subspace projections with 0(KPN \og(N)) operations, re- 
quiring O(KPN) memory, where K is the number of sparse 
components, P is the number of channels, and N is the 
number of collected samples per channel. For robustness, we 
introduce the Partial Effective Rank (PER) derived from the 
work of Roy 1121 . which only adds a marginal 0(K 2 ) cost 
to be evaluated online. The PER is used to derive a heuristic 
estimate of K requiring little overhead. The heuristic may fail 
if the channel is not sparse enough, giving a beneficial sense 
of adequacy to the algorithm. In such non-sparse cases, the 
algorithm can yield to a non-sparse estimation method. 

The resulting algorithm — combining FRI sampling, the 
PER criterion and Krylov subspace projection — named FRI- 
PERK was implemented in MATLAB and compares favorably 
to FRI for N > 200. 

We conclude with the application of FRI-PERK and a CS 
algorithm (RA-ORMP) to field measurements collected in 
l9l with synthetic AWGN, and compare them with a simple 
non-parametric method (spectrum lowpass interpolation). The 
results show the SCS property can be exploited in the medium 
to low SNR bracket (below OdB) to significantly lower the 
Symbol Error Rate (SER). 

Compared with the non-parametric method, FRI-PERK in- 
creases the proportion of correctly decoded symbols by 1/10 
in average, and 1/3 at best. 
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Fig. 3. Consider a single pulse ( , dashed curve) critically sampled as shown on the left column. From the samples (-x, stems), 1-sparse representations 

of the original signal are computed. In a), the problem is treated as a parametric estimation of to and Co as in the FRI framework. Conceptually, the signal 
component is chosen from the infinite and uncountable set of the pulse shape and its shifts in [0, 1[. The original signal has a perfect 1-sparse representation 
in this setup. In b), the signal component is chosen in a finite set of functions forming a basis of the signal space. The original signal can be represented by 
these functions, but it does not have a 1-sparse representation in general. In c), three times more synthesis functions are added to the set, to form a frame. The 
signal has a much closer 1-sparse representation in this frame thanks to the shift invariance introduced by the redundancy between the synthesis functions, but 
the estimation becomes combinatorially more complex. The estimation frameworks b) and c) are referred as discrete sparsity which is used in Compressed 
Sensing (CS), and the estimation is subject to a trade-off between accuracy and complexity. 



II. The physics of sparse common support channels 
A. Sparsity beyond simulations 

Algorithms for the estimation of a sparse signal from 
noisy measurements are quite mature iTPJl . Ifl4l . J3]. The 
principal hurdle for their application to physical signals is the 
inadequacy between the simple theoretical model and reality. 

For example, indoor electromagnetic channels are in general 
not sparse, as described in the Saleh and Valenzuela model 
[fl~5l . In this model, reflections are bundled in clusters having 
an exponentially decaying energy. This dynamic holds in 
general fl6l . ifTTl . and after demodulation, an electromagnetic 
channel impulse response h(t) can be described as the super- 
position of clustered reflections, 

K 

= E E w(t - t fc - Aj), 

k=l (Ai,Ai)eC k 
K 

I>~^' fc E A ie -^ A '0(e^), (1) 

fc=l (A,,A,)eC k 

such that ip is the channel mask in the time domain (after de- 
modulation), and Ck are clusters of reflections with a delay tk- 
Within each cluster we observe reflections shifted by A; from 
tk with randomly distributed complex-valuecQ amplitudes Ai . 

The number of clusters K is usually small, but the total 
number of reflexions is not. However, if the bandwidth O v of 
ip and the maximal intra-cluster delays Aj are small enough, 
the th order approximation 

2 After demodulation, the equivalent baseband channel becomes complex- 
valued. 



holds at all considered frequencies we]-jr, it]. If in addition 
the amplitudes are distributed identically and independently 
enough, one of the many avatars of the central-limit theorem 
may be used to obtain a simplified model known as the 
"Multipath Rayleigh channel model": 

K 

h(t) = ^ C k <p(t - t k ), C k ~ A/c (0, c 2 k l) . (2) 
fe=i 

The channel is called sparse if and only if the expected 
delay-spread r upper-bounding tx — ii verifies 

k/t « r^/271-, 

i.e. if the rate of innovation is substantially lower than the 
Nyquist rate. 

The requirements for sparsity are thus two-folds: 

• The "girth" of each cluster must be a fraction of the 
inverse bandwidth of the channel. 

• The density of clusters must be a fraction of the channel 
bandwidth 

The first property hints at channels with a low or medium 
bandwidth, while the second one requires long-distances prop- 
agation where both the transmitter and receivers are in a 
relatively clear environment, such as outdoor communications. 

The high velocity of electromagnetic (EM) waves ensures 
scatterers of large dimensions, such as trees, generate clusters 
of modest time spread allowing the use of the simplified 
model d2J instead of (Q~|i for channels with a bandwidth up 
to 100MHz approximately. Fig. summarizes the conditions 
necessary for the channel to be sparse. The channel shown in 
Fig. Hhas a bandwidth of 120MHz, and an EM wave travels 
2.5m in a time-lapse equal to the inverse-bandwidth. There- 
fore, in typical noisy operating conditions, reflections having 
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Short delay-spread r 



Long delay-spread r 




Fig. 5. This figure shows how sparsity relates to the channel bandwidth 
and its delay-spread. All four panels (a)-(d) have the same number of signal 
components — 80 of them grouped in 8 clusters with exponentially fast energy 
decay. Signals (a) and (c) cannot be considered sparse as the rate of innovation 
is close to/greater than the Nyquist rate on the time-lapse corresponding to 
the delay-spread, and outside it. Signal (b) is weakly sparse, the rate of 
innovation is for this reason also high. In this setup the discrete sparsity 
approach may be suitable. The signal (d) can be considered sparse as only the 
8 clusters will be resolvable in the presence of noise. The rate of innovation of 
this approximation is much lower than the Nyquist rate. Even though (b) and 
(d) have the same rate of innovation in a strict sense, (d) can be approximated 
with a signal having 1/10* h the rate of innovation of (b) thanks to its low 
bandwidth. This approximation motivates the use of a model with a low rate 
of innovation in the low-SNR regime where the model approximation error 
has less power than the noise. 



a path difference of a fraction of 2.5m will be unresolvable. 

On the other hand the ratio K/t will be quite large 
necessitating long range outdoor transmissions for the channel 
to be considered sparse. 

Fig-Hshows the magnitude impulse responses of an outdoor 
channel with a medium bandwidth from the FTW0 MIMO 
dataset |9|, and confirms this analysis. 

Note that we do not rule-out the occurrence of sparsity 
in ultrawide-band communications as the channel dynamics 
become quite different at very short range ifTcVl . 



B. Common support 

We now consider the case where a receiver (Rx) possess 
several antennas, as in SIMO and MIMO communications. 
Therefore, if we consider a 1-to-P communication verifying 
the multipath model (fJJ, the receiver observes P channels 

K 

M*) = Zj C k, P v(t-t k ,p), p = l,...,P, 

k=l 

for a total of 3KP unknown coefficients parametrizing the 
channels. 

As for sparsity, the high velocity of EM waves is again a 
crucial factor to establish the common support property, and 



the difference in amplitudes 

tk,i * ife,2 tk,p, 
Chi ¥= Cua Ck,p- 



(3) 
(4) 



Indeed, if d max is the maximal distance between the P 
receiving antennas, the ToA difference between antennas is 
upper-bounded by 2d max /c.The criterion for common support 
© is 

A — 



In Fig. |U d max = 60cm and 7r jp = 1.25m. 

To assert ©, we need to quantify the spatial correlation of 
paths amplitudes between the receiving antennas. Using Q 
(Proposition 6), and assuming narrow scatterers, the correla- 
tion between antennas separated by a distance d m . n can be 
approximated as 



E 



[|C fc , m | 2 ]E[|C7 fc ,,„| 2 ] 



m^^carncr 



/c). 



Using the numbers of Fig. [5] correlation between antennas 
does not exceed 0.5. In most communication scenario (|4]i is 
verified by design as it provides spatial diversity lfP9l . 

In conclusion, we see that the common support property is 
relatively easy to establish as it only depends on the antennas 
topology. This is not the case for sparsity, in Fig. |4] when 
the transmitter enters a tunnel, a dense train of reflections is 
observed. Hence, it is reasonable to say outdoor communica- 
tions with medium bandwidth have typically — as opposed to 
"surely" — sparse channel impulse responses. 

III. Estimation of sparse common-support 

CHANNELS FROM DFT PILOTS 

Assume each frame of period ry is uniformly sampled in 
time 

x p [n] = x p (riTf/Nf) , n = 0, . . . , Nf — 1. 

The pilot layouts in Fig. [T] lead to estimate a set of sparse 
common support channels from a uniform subset of its DFT 
coefficients. 

To make notation less cumbersome, we assume without loss 
of generality that Nj = 2MD + 1 , and use negative indices 
which shall be understood "modulo the index limit" in general. 
Also the pilot index is centered on the DC carrier 



V 



-AID, 



-2D, -D, 0, D, 



MD) 



and is supposed to take the value 1 . The total number of pilots 
is N = 2M + 1. 

Each frame is periodically padded to ensure the convolution 
with the CIR is circular, thus 

x p = diag (lp) hp, 

x p = Wdiag(lp) W H h p , 



Forschungszentmm Telekommunikation Wien, 



urements . f tw . at 



such that W is the DFT matrix, 1-p is the indicator of V, 
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and 



K 



nt k /T e 



(5) 



fc=i 



is the sampled SCS channel model in the DFT domain. 

The operator VFdiag (1-p) W H is the orthogonal projection 
in the subspace spanned by the basis vectors of the DFT 
corresponding to the pilot indices. 

With a good synchronization, the ToAs are contained in 
] - r/2, r/2], and if 



I — | , DeN, 



(6) 



the original spectrum h p can be recovered by ideal lowpass 
interpolation 



W H h p = C D diag(l v )W H x p , 

such that Cd is circulant with entries 

sin(7r(m — n)/D) 



(7) 



[Ci 



n sm(n(m-ri)/Nf) 

This principle is at the core of OFDM communications which 
interleave data and pilots in the DFT domain. 

If the measurements x p are corrupted by AWGN, this 
technique projects orthogonally the measurements in the signal 
subspace and therefore minimizes the estimation MSE if 
common-support property or sparsity are ignored. 

A. FRI approach 

The channels in (0 do not lie in linear subspaces, but in 
a common union of subspace. Therefore the channels can be 
estimated in two-steps 

1) Identify jointly the K subspaces (the common support). 

2) Compute the orthogonal projection of the measurements 
in the union of subspaces separately for each channel. 

Algorithms and analysis for this problem are found in [jjj, and 
are a simple extension of well-known linear array-processing 
techniques such as ESPRIT [201 or the annihilating filter ||2T1 
to common-support channels. 

The union of subspaces is identified by studying the column- 
space of the following data matrices 

~ h p [0] h p [-D] h p [-2D] ■■ " 
h p [D] h p [0] hp[-D] •• 
h p [2D] h p [D] h p [0] 



of dimensions (M + 1) x (M + 1) 

These data matrices have a Vandermonde decomposition 



such that V 



JVdiag(C Pi i, 
1 

\Y Dtl l Ts 



.., C p ,k)V h , 
1 

W 2DtK l T 



is an (M+ 1) x K Vandermonde matrix and J is the exchange 
matrix. This decomposition has the merit to clearly show T p 
has a column-space of dimension K which depends only on 
the support, and is thus the same for each channel. Moreover 
any basis for this column-space verifies a rotation invariance 
property. Indeed, let 



V 



Mi 



(M-l), 



and V 



1 d ±f 



[V] 



2:M,: 



Then 



yt = yiv£r ; * = diag (w Dtl/r ° , W DtK,T ^j . 
Any basis V having the same span as V can be written as 



def 



VA, where A is a fullrank K x K matrix, therefore 
=V T A, 



which means the support is recovered from any basis V of 
the column-space of T p as the phase of the eigenvalues of the 



solution to 



V l X , 



which is the ESPRIT algorithm 12011 . In the presence of 
AWGN, the basis V may be obtained in a robust manner 
from the SVD of the stacked matrix 



(8) 



but other subspace identification techniques may be used 
depending on the measurements model. 

To compare with the canonical estimation technique 
this method not only uses the limited-length of the delay- 
spread but also sparsity and common-support. The solution 
of the estimation problem is to be found in a smaller set of 
candidates. 

We expect that with good algorithms the estimation will 
be more resilient to noise. On the other hand more restrictive 
models yield higher modelization error. 

Note that sparsity is not used to reduce the number of pilots 
but to make the estimation more robust to noise. See J3 for 
pilot reduction. 

B. CS approach 

The measured DFT samples x p corresponding to the set of 
pilot subcarriers V are linked to the channel impulse response 
coefficients by 



X = [W] rc H 
X d = f [[xfo, . . 



(9) 



[x P ]-p] 



H [ hli 



, h P ] 



where C is the index set on which the channel impulse 
responses are supported a priori. C contains contiguous indices 
covering a time-lapse less than or equal to the delay-spread. 
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The matrix of channel coefficients H is assumed to be jointly 
row-sparse, i.e. only a few rows of H are not null. 

This is known as the MMV (Multiple Measurement Vectors) 
problem which is not a trivial extension of the SMV (Single 
Measurement Vector) problem. It has received a lot of attention 
in the past few years and good, efficient algorithms exist such 
as RA-ORMP flj], Lee & Bressler fl4l.... 

First notice that the channel model described in (0 is not 
jointly row-sparse nor sparse, unless the ToAs coincide with 
the sampling points. 

Second, since (O holds, the system (0 is invertible, which 
is typically not the case in compressed sensing. In this study 
sparsity is used as a regularization technique to gain robustness 
to noise as in Section UlI-AI One could select a random subset 
of pilots to effectively compress the channel measurements 
leaving additional space for data. 

The compressed sensing literature on sparse channel esti- 
mation has followed two tracks 

1) The ToA coincide with the sampling grid. 

2) The ToA do not coincide with a regular grid resulting 
in "approximately sparse" channels ifTTI . |[T0l . El . It 
introduces a model mismatch which can be mitigated 
using frames (increased estimation complexity). 

The first assumption is remote from reality and the reported 
performances shall be taken with a grain of salt, nevertheless 
it is useful to benchmark algorithms. The second one is more 
relevant and its limitations due to model mismatch may fade 
away when applied to real-world data where it is unavoidable 
anyway. 

IV. Application of the FRI approach: PERK 

In this section, we consider the number of signal compo- 
nents K and the number of antennas P to be small compared 
to the number of pilots N . 

The FRI based channel estimation technique outlined in 
Section IIII-AI has two shortcomings if implemented in a 
straightforward manner 

1) Its computational complexity and memory footprint are 
respectively 0(N 3 ) and 0(N 2 \ Both are contributed 
by the SVD decomposition used to estimate the column- 
space of T. 

2) The sparsity level K is unknown. 

The complexity [TJ is especially important for channel esti- 
mation as it is a core signal processing block at the receiver's 
physical layer. It is called on several times per second, and 
must operate in real-time with limited power and hardware 
resources. 

A. An O (KPN log N) FRI estimation 

Naive computation of an SVD of T defined in [8] to obtain 
a A"-dimensional subspace of the column-space is wasteful 
for two reasons. First, only K out of M + 1 principal singular 
pairs (<r m , v m are of interest. Second, T is well structured — 
made of Toeplitz blocks — and matrix factorization techniques 
such as QR will destroy this structure during the factorization 
process, rendering it unexploitable and requiring an explicit 
storage of the data matrix. 



Since we are interested in the column-space of T, we will 
work on the hermitian symmetric correlation matrix 

p 

T H T = = ^ T%T P , 

p=i 

which has eigenpairs (a m , v m , m = 0, . . . , M. 

A solution to compute only the leading eigenpairs, is to 
project the correlation matrix in a Krylov subspace 11231 . This 
is an iterative method in which computations are performed on 
the original matrix, meaning the original structure is preserved. 
Hence the memory footprint is kept low and the computational 
complexity is similar for each iteration. 

Krylov approximants have received a lot of attention in the 
numerical analysis 11231 . 11241 . (25), ll26l or control theory l27l . 
Il28l literature, therefore we will only quickly skim through the 
subject. 

Projection into a Krylov subspace is done with Lanczos 
algorithm. A comprehensive analysis was done by Xu fl29l 
for the estimation of covariance matrices in linear array 
processing. He proposed an 0(N 2 ) algorithm^ together with 
an approximate 0(N 2 ) estimation of the subspace dimension 
K. Implementation of the Lanczos algorithm is quite involved 
in practice and may require costly corrections at each iteration 
ll26l , 11231 , ll24l . this is why asymptotically more expensive 
0(N 3 ) are generally preferred unless the system is very large. 

The additional structure on the original data matrix T allows 
us to lower the complexity from 0(N 2 ) to O(NlogN), 
making it appealing even for matrices of modest size, and we 
will derive a novel criterion to estimate the signal subspace 
dimension K which requires 0(K 2 ) computations to be run 
along the subspace estimation process. 

A A"-dimensional Krylov subspace JC of a A/-dimensional 
hermitian matrix A has is 

^K,f(A) = sp&n k=1 K A k f, 

where / is an initial vector which can be randomly chosen. 
Note that all the properties of Krylov subspaces only hold in 
probability, an "unlucky" initial draw may compromise them. 

The fc th basis vector A k f is a monomial of A of degree k, 
therefore using a three terms linear recursion on this sequence 
of monomials one can derive a sequence of orthogonal polyno- 
mials (33). This is equivalent to say that an orthonormal basis 
Q K of K,K,f{A) is computed by orthogonalization of each of 
A k f only against the two previous ones and normalization. 
So, the main cost of the procedure is the computation of 
the non-orthogonal basis vectors, which is done by recursive 
matrix-vector multiplications. 

The three terms recursion used to orthogonalize the Krylov 
basis implies that JCkj(A) has a tridiagonal decomposition 

V f , K A = Q^TkQk 

where Q K is unitary and Tk is symmetric and tridiagonal 
(thanks to the 3-terms recursion). The eigenpairs of Vf.KA 
are derived from this factorization at little cost 11231 . 11241 . and 
they are called the Ritz pairs. 

4 K is considered small and constant compared to N and P = 1. 
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TABLE I 

APPROXIMATIVE "O" COMPLEXITY FOR SUBSPACE IDENTIFICATION. 



Algorithm 


Main computation 


Storage 


Latency 


Processing units (pu) 


Krylov + PER 


KPN log N 


K(N + 1) 


KN (30) 


P x FFT engines (N + 1 points) 


Full SVD (serial) 


PN 3 


PN 2 


PN 3 


1 multipurpose pu. 


Full SVD (systolic array) (5T], (32] 


PN 3 


PN 2 


N{logN + P) 


N 2 x 2-by-2 SVD pu. 



The full SVD is done with Jacobi rotations and can be massively parallelized using the systolic array method of Brent, 
Luk & Van Loan 1311 - Parallelism greatly reduces the latency of the system, but since it does not reduces the number 
of computations it comes at the cost of using multiple processing units. 



A remarkable property is that the Ritz pairs quickly con- 
verge to the principal eigenpairs of A as K grows. This quick 
convergence does not follow exactly the result of Xu ||29l — 
since in our case the Toeplitz blocks are of approximately 
square size — but it can be explained with a theorem of Saad 
l23l (Theorem 12.4.1) which links the rate of convergence of 
the Ritz pairs to the growth rate of Chebyshev polynomials. 
The theorem shows that Ritz pairs converge faster to the 
corresponding eigenpairs if the eigenvalues are farther apart. 

Because of the Toeplitz structure of the data matrix, matrix- 
vector multiplications with T H T, which is the central step of 
a Lanczos iteration has a cost of 0(PN log(iV)). Indeed, 

p 

T H Tf = ^ TfT p f 

P =i 

is the sum of P matrix-vector multiplications, each of them 
realized as two consecutive Toeplitz maxtrix-vector multiplica- 
tions. Square Toeplitz matrices of dimension M+l can be em- 
bedded in circulant matrices of dimension 2 (M + 1) = N + 1 



T 
T 



def 



p — toeplitz (tp.. 



■Mi 



i tp,0i 



p - toeplitz . . . ,t P! M,0,t : 



p-M 



l). 



Circulant matrices are diagonalized by the DFT matrix, 
hence the cost of a circulant matrix-vector multiplication is 
dominated by the cost of 4 FFT. 

Since 



M 



I'M 



c 



H 



Hi 



c 



H 



T H Tf, 



each Lanczos iteration has a cost dominated by AP FFT0 
of length N + 1. Since only the first half of the input and the 
first half of the output of each FFT are non or not needed, 
each FFT could be replaced by two FFT of half the size. 

The resulting algorithm fulfill all the requisites for a fast 
embedded implementation since most of the computational 
load is put on the FFT block, ubiquitous in communication 
systems. Table [Q 

B. On-line sparsity assessment 

In this section we introduce the Partial Effective Rank 
(PER) a criterion to estimate the signal subspace dimension 

5 The DFT of the circulant generators can be precomputed. 



working online with the Lanczos algorithm. Its main advantage 
compared to other methods is to require little information 
about the noise space. A formal study of the PER is deferred 
to an upcoming report. 

1) Shortcomings of traditional information criterions: 
Information theoretic criteria such as Rissanen's MDL flT], 
Akaike criterion or the EDC ll34l are powerful mathematical 
tools which may be used to evaluate the sparsity level K. They 
all follow a similar pattern, which is to minimize 

ITC(cr, K) = £(<t, K) + K ■ (2(M + 1) - K) ■ P(M), 

where C is the log-likelihood function based on cr the singular 
values of T, and K an estimate of the sparsity level. The 
term P is a penalty growing at rate between 0(1) and o(N) 
depending on which one of the criteria is used. 

Their performances are well understood [051 . but the evalua- 
tion of the likelihood function requires to compute the product 



M 



IK-1 



[ al = det(T^T) / [] * 



2 

t; ■ 



fc = 



which has an algorithmic cost superior to the Lanczos 
algorithm itself. 

Also, the argument used in l36l to show consistency of a 
partial evaluation of ITC(er, K) cannot be used in our setup 
because the asymptotic distribution of the noise matrix spec- 
trum^] is extremely different. As the number of pilots increases, 
the probability measure of the noise spectrum concentrates 
071 . but its support is unbounded. This is a major difference 
with the setup considered in l36l . ||29l where the noise power 
spectrum concentrates around the noise variance . Formally 
we can say 

Proposition 1: Let T be an (M + l)P x (M + 1) matrix 
composed of fixed and finite number P of stacked Toeplitz 
blocks T p of square dimensions (M + 1) x (M + 1) as in 
dS). Assuming the generators of each block are sequences of 
white Gaussian noise 

(T H T/{M + 1)) = O(logM), 

i.e. even with proper normalization, the spectral norm of an 
all-noise matrix diverges. 

Proof: The matrix norm induced by the euclidian vector 



spectrum" shall be understood as the SVD spectrum of the matrix and 
not the noise Fourier power spectrum when matrices are concerned. 
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norm on <C.( M+1 ) p is submultiplicative, therefore 



\\Ti II =S 



[t o] 



«; i- llTii. 



(10) 



We then use a result of Meckes 1381 on the spectral norm 
of square Toeplitz matrices with independently distributed 
subgaussian generating coefficients 



||Ti|| ~ O^MlogM 
Using the triangle inequality and Meckes' result 

\\T H T\\ < ^ \\T p H T p \\ ~ PO(MlogM), 
p 

which together with ([TUt and (fTTT i proves the proposition 



(11) 



It indicates that for pure AWGN measurements the spectrum 
of the matrix is unbounded, and one cannot expect the signal 
and the noise spectrum to separate nicely. Without a proper 
approximation, the computation of IIm=/f+i °f n would have 
a cost of 0(PN 2 log(iV)) in general], driving the complexity 
of the algorithm. 

2) The Partial Effective Rank: The effective rank, is a 
matrix functional introduced by Roy [12] which may be seen 
as a "convexification" of the rank. 

Definition 1: Let A be a matrix with singular values a = 
[ci, <tj\/] t in decreasing order, and singular values 

distribution 



Pm = c m /||tr|[i , m 
The Effective Rank of A is 



1. 



M. 



erank(A) 



where H is the entropy of the singular values distribution 



A I 



HijPl, . . . ,p M ) = - J] Pm. l0g e p TO . 

m=l 

For elementary properties of the effective rank, we defer to 

urn 

We can now introduce the partial effective rank 
Definition 2: For A and H as in Definition Q] and 



PK,k = Ofc/llfi^ 111 j k = 1, 
the Partial Effective Rank (PER) is 



K s: M, 



PERk(A) = e nipK - u -' VK ' K) . 

The PER verifies 
Proposition 2: 

< PERjf+i(A) - PERk(A) s: 1. 

The lower bound is reached if and only if o~k+i = and 
the upper bound 1 if and only if 



'Using a block-Levinson recursion to estimate the determinant T, it maybe 
brought down to 0((PN) 2 ), but it was not verified. 



H 




b) 



A posteriori optimal K 

■ - - Heuristic based on PER (median) 



true value, K = 7 



underestimate 




dB 

EQgain 
10 



-5 
SNR [dB] 

O 



■ FRI + PER (L=5) 
FRI + K=7 (a priori knowledge) 
FRI + Oracle (upper-bound) 

! No equalization (equi-SNR) 




input SNR [dB] 

Fig. 6. Simulation on a signal with 7 components. The PER curves in (a) 
show a clear inflection at K = 7. As the SNR diminishes, the inflection occurs 
at lower values of K and completely disappears at SNRs < — lOdB. The 
graph b) empirically shows that the PER indicates the number of components 
significantly above noise level which can be reliably estimated: the value of K 
it predicts follows (in median) the one minimizing the channel estimation error 
(energy of the residual). Finally, in c) the estimation performances obtained 
with the PER closely follow the one obtained with an oracle choosing a 
posteriori the optimal number of components. We see also that below — 5dB 
the underestimation of K by the PER is beneficial compared to the true 
value 7. At high SNR, the PER sometimes overestimate the number of paths, 
mitigating the performances. 



The increase of the partial effective rank with K reflects 
how "significant" is the K th principal component of A 
compared to the previous ones. 

Fig.[6]provides empirical evidences that the evolution of the 
PER during the Lanczos process provides a suitable criterion 
to estimate K. 
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We settle on a very simple heuristic to estimate K based 
on the PER. We choose the smallest K such that 

PERr-(T) - i (PER A - +1 (T) + • • • + PER K+L (T)) ^ 0, 

which is a very simple "positive slope" detector. It introduces 
a small overhead: K + L dimensions are required to decided 
if the signal space is of dimension K. 

Despite its simplicity and its heuristic motivations, this 
criterion proved to be robust in practice as shown in Fig. |6] 

Too many components and/or large modelization error in- 
dicate the channels are not sparse. 

V. Numerical results 

A. accuracy: FRI vs CS vs lowpass 

1) Setup: We use the "FTW rural" dataset to compare 
aforementioned algorithms on the SCS channel estimation 
problem. The transmitter is a mobile single antenna device^. 
The receiver is a "base-station" with P = 8 receiving antennas. 
A total of 251 frames are transmitted on a 120MHz wide 
channel with a 2GHz carrier frequency. The transmission has 
a high SNR, so we consider the samples to be the ground truth 
(infinite SNR). Various SNR conditions are simulated with the 
addition of AWGN to the samples. 

The DFT pilots are uniformly laid-out every D = 3 DFT 
bin. The estimated channel is used to demodulate 4-PSK 
coded data symbols occupying the left over DFT bins, and 
the obtained Symbol Error Rate is the quality metric used to 
benchmark the estimation. 

2) Interpretation of the results: From Fig. [7] we may 
conclude that 

• The channels do not exactly fit the SCS model, therefore 
the modelization error becomes larger than the noise at 
high SNR 

• The SCS property helps in lowering the symbol error rate 
at medium to low SNR (below dB) 

• The "sparsity" model assumed by FRI (few reflections) 
match the field measurements better than the one assumed 
by CS (few non-0 coefficients) as seen in Fig. [8] 

• Any algorithm exploiting sparsity must be "introspec- 
tive", i.e. it must detect when sparsity does not occur, 
and fall-back to a more traditional method whenever it 
happens. It is exemplified by the stroll through the tunnel. 

B. Computational benchmark 

Both FRI-PERK0 and RA-ORMP were implemented in 
MATLAB0 Because of the decimation in frequency and 
the limitation of the delay-spread in time, the projection in 
RA-ORMP cannot be realized with simple FFTs, results are 
reported in Fig. |9](c-d). 

8 Only the first Tx antenna is used. 

'Uses the ARPACK library 1391 . Since the size of the Krylov subspace 
dimension must be fixed beforehand, it does not has the ability to stop before 
K max is reached. 

10 v. R2011b for MacOS X, running on a 1,8GHz Intel Core 17 processor 
and enough 1.3 GHz DDR3 memory 



If this is not the case, we implemented a fast version of 
it and reported results in Fig. |9](a-b). Note that such a setup 
does not apply to most standards HI, 0. 

Definitive conclusions for hardware implementations cannot 
be drawn from these experiments, however we can see that the 
low asymptotic complexity of FRI-PERK is not misleading 
since it provides improvements for a number of pilots which 
are encountered in practice JT], Q (higher bandwidth modes). 
For small numbers of pilots, the direct implementation maybe 
faster, though we did not used the tracking capacity of Lanczos 
algorithm used by choosing an initial vector lying in the signal 
space obtained at the previous step J36). 

VI. Conclusion 

In this study, we addressed computational and robustness is- 
sues of FRI techniques applied to channel estimation. The tests 
conducted on field measurements show the SCS assumption is 
relevant at medium to low SNR where the noise power exceeds 
the one of model mismatch, and can be used to substantially 
lower the SER. 

The PER criterion used to estimate the sparsity level gives 
satisfactory results, but an upcoming thorough analytical study 
is required. 

Comparison with discrete techniques from compressed sens- 
ing indicate a trade-off between speed and accuracy compared 
to FRI-PERK. Using a finer discretization (frames) allows one 
to vary this trade-off, and shall be tested in the future. We 
believe that with a proper discretization similar speed/accuracy 
results shall be met. 
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a) Average over time (all snapshots) b) Average over SNR (— 15dB to OdB) 
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Fig. 7. From a) we conclude that sparse recovery lowers the SER compared to non-sparse recovery below OdB of SNR. If we look at the SER over time in 
b), we see that FRI-PERK is robust in the sense that if the input signal is not sparse, its performs approximately as well as a non-sparse recovery. 
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Fig. 8. This figure compares the estimation result of FRI-PERK and RA-ORMP. The input signal is the first frame received at the first antenna corrupted 
with AWGN to obtain — 5dB of SNR. Panel (a) shows the measurements and the original CIR. Panel (b) shows a portion of interest of the CIR estimated with 
FRI-PERK. The PER criterion estimates K = 3, and by visual inspection the three signal components found match the largest ones of the original signal. 
Panel (c) shows the result obtained with RA-ORMP. The discrete sparsity model causes the estimation to be more sensitive to uncorrelated noise, as spurious 
spikes contributed by noise are estimated as signal components. 
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Fig. 9. A benchmark is run for pilot sequences of length 2M + 1 = 101 to 2M + 1 = 1001. The top row compares FRI-PERK with FRI-PER — same 
algorithm using a full SVD instead of Krylov subspace projection — and a fast implementation of RA-ORMP using FFTs, which is possible since the pilots 
are contiguous in frequency (D = 1). The bottom row follows the same procedure, but with scattered pilots (D = 3) and a delay-spread smaller than the 
frame length. There is no straightforward fast implementation of RA-ORMP in this case. 
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